METHOD AND SYSTEM FOR EFFICIENTLY MATCHING EVENTS
WITH SUBSCRIBERS IN A CONTENT-BASED
PUBLISH-SUBSCRIBE SYSTEM



Background of the Invention 

The present invention relates generally to computer software, and
more particularly, to a system and method for efficiently matching events
with subscribers in a content-based publish-subscribe system.

The expansion of local and wide area computer networks has pushed
computer technologies to a level that must be adaptive to a distributed
environment. Computer applications can be concurrently running on
different nodes in a large scale network, and in this environment, a coherent
multi-event management system can create synergistic results and is an
essential element of the networked computers. It is known in the art that a
publish-subscribe paradigm is one of the simple and efficient techniques to
interconnect applications in a distributed environment. Information
providers (publishers) publish information in the form of events in a
publish-subscribe system, which delivers these events to the information
consumers (subscribers). The system acts as an intermediary between the
publishers and subscribers and is typically implemented as a network broker
which is responsible for routing events from publishers to subscribers. Most
publish-subscribe systems support some mechanism by which subscribers
can specify what kind of events they are interested in receiving. In such
systems, each event is categorized as belonging to a particular group.
Subscribers can then indicate the groups to which they want to subscribe.
The publish-subscribe system ensures that subscribers are notified of events
belonging to their respective groups. These systems are also known as
group-based systems.

In addition to group-based systems, there are content-based
publish-subscribe systems. A content-based publish-subscribe system
allows a subscriber to control which events it is notified of. Events
in such a system have various attributes, and subscribers can specify
arbitrary boolean predicates over these attributes. A subscriber is notified of
an event only if the predicates specified by the subscriber are satisfied. For
example, a simple event for a stock quote could have two attributes:
the NAME and PRICE. A subscriber could specify the following predicate:
(NAME = "XYZ") AND (PRICE > 20). That is, this subscriber would be
notified of the related event only if the NAME attribute of the event is "XYZ"
and its PRICE attribute is greater than 20. Compared to group-based
systems, content-based systems provide subscribers with great flexibility in
choosing events for notification. A good example of a publish-subscribe
system supporting content-based subscription is the Java Message Service,
which is a messaging middleware standard that allows subscribers to specify
SQL92 predicates over message attributes.
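
As a rough illustration of the stock quote predicate described above, the
following Python sketch evaluates (NAME = "XYZ") AND (PRICE > 20) against
an event held as a dictionary. The function name stock_filter and the
dictionary representation of an event are assumptions made for this example
only, and a missing attribute is simply treated here as "no match".

```python
from typing import Any, Dict

Event = Dict[str, Any]  # assumed representation: attribute name -> value


def stock_filter(event: Event) -> bool:
    """Hypothetical subscriber filter for (NAME = "XYZ") AND (PRICE > 20)."""
    name = event.get("NAME")
    price = event.get("PRICE")
    # A missing PRICE cannot satisfy PRICE > 20, so the event is not matched.
    return name == "XYZ" and price is not None and price > 20


if __name__ == "__main__":
    print(stock_filter({"NAME": "XYZ", "PRICE": 25}))
    print(stock_filter({"NAME": "XYZ", "PRICE": 20}))
```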
Given the advantages that content-based publish-subscribe
systems have, an important problem in designing and implementing a
content-based publish-subscribe system is the event-subscriber matching
problem. In a networked environment, given an event and a set of
subscribers, the problem is to determine, as efficiently as possible, the subset of
the subscribers that "match" the event, i.e., those subscribers whose
predetermined predicates are satisfied by the given published event.

A conventional approach would be to test the event individually
against the predetermined predicates specified by each subscriber, one at a
time, until all the predicates are tested. Such an approach is a "linear"
approach and would not be scalable. A large system may have thousands of
subscribers and millions of events at any moment, and the time spent to
match the events with the subscribers can be significant.
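
The linear approach can be pictured with the short Python sketch below, in
which every subscriber's predicate is evaluated for every event. The names
match_linear and subscriptions are hypothetical and serve only to show why
the cost grows with the number of subscribers.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


def match_linear(event: Event, subscriptions: Dict[str, Predicate]) -> List[str]:
    """Test the event against each subscriber's predicate, one at a time."""
    return [name for name, pred in subscriptions.items() if pred(event)]


# Hypothetical subscribers; each predicate is evaluated for every event.
subscriptions: Dict[str, Predicate] = {
    "S1": lambda e: e.get("NAME") == "XYZ" and e.get("PRICE", 0) > 20,
    "S2": lambda e: e.get("NAME") == "ABC",
    "S3": lambda e: e.get("PRICE", 0) > 100,
}

if __name__ == "__main__":
    print(match_linear({"NAME": "XYZ", "PRICE": 150}, subscriptions))
```

With n subscribers, each event costs n predicate evaluations even when most
subscribers share the same leading tests.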

Some experts in the industry suggest a solution to the matching
problem, where subscriptions are organized into a matching tree, whose
traversal yields a set of subscribers matching a particular event. See Marcos
K. Aguilera, Robert E. Strom, Daniel C. Sturman, Mark Astley, and Tushar
D. Chandra, Matching Events in a Content-Based Subscription System,
Principles of Distributed Systems (1999). However, in the Matching Events
article, subscriptions are limited to conjunctions of atomic tests. The
teaching of this article is based on the premise that any boolean predicate can
be transformed into a disjunction of conjunctions. For example, a simple test
    (A OR B)
can be transformed into
    (A AND B) OR (A AND NOT B) OR (NOT A AND B)
For transforming an arbitrary boolean predicate into a correct form
such as the above example, the process involved can be extensive and costly
in terms of time and processing capacity. Moreover, a Directed Acyclic Graph
(DAG) constructed for the original test can be expanded exponentially due to
the increase of tests caused by the transformation.
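
The transformation in the example can be checked mechanically. The Python
sketch below confirms, over two-valued inputs, that (A OR B) and its expanded
form agree, and counts how many fully specified conjunctions the same style
of expansion would need for an OR of n tests. The helper name disjoint_terms
is an assumption of this sketch, not something taken from the cited article.

```python
from itertools import product


def original(a: bool, b: bool) -> bool:
    return a or b


def expanded(a: bool, b: bool) -> bool:
    # (A AND B) OR (A AND NOT B) OR (NOT A AND B)
    return (a and b) or (a and not b) or ((not a) and b)


# Exhaustive check over all two-valued assignments of A and B.
equivalent = all(
    original(a, b) == expanded(a, b) for a, b in product([False, True], repeat=2)
)


def disjoint_terms(n: int) -> int:
    """Count the conjunctions needed when an OR of n tests is expanded this way."""
    return sum(1 for bits in product([False, True], repeat=n) if any(bits))


if __name__ == "__main__":
    print(equivalent, [disjoint_terms(n) for n in (2, 3, 4)])
```

The count grows as 2^n - 1, which gives a feel for the exponential expansion
mentioned above.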

Furthermore, conventional binary decision diagrams and If-Then-Else
DAGs primarily address the problem of finding an efficient representation for
boolean expressions (including sub-expressions), and they are widely used in
design and verification of logic circuits. Applying these techniques to
construct DAGs is more of a bottom-up approach, and the emphasis is on
sharing all possible sub-expressions or low level expressions.

Although such a representation could be used to solve the matching
problem, such an approach would still be linear. Moreover, sharing
sub-predicates that are common prefixes is likely to result in sub-linear
complexity.

Attorney Docket No.: 26530.16 (IDR-426)

What is needed is an efficient method to solve the matching problem
for subscriptions that are arbitrary boolean predicates, which can make use
of the standard boolean operators AND, OR and NOT and parentheses, in a
content-based publish-subscribe system situated in a distributed network
environment.

Summary of the Invention 

A method and system is provided for matching an event with a group
of subscribers in a content-based publish-subscribe system in a distributed
computer network environment. In one embodiment, each subscriber of the
system is allowed to define one or more predetermined predicates or specified
filters to screen the events it receives. These predicates define matching
tests using standard boolean connectors such as AND, OR and NOT.
Parentheses can also be used to modify the order of these tests. A subscriber
matches an event if the predicates supplied by the subscriber are all
satisfied.

In one example of the present invention, a suitable virtual Directed
Acyclic Graph (DAG) is built based on the predicates of the subscribers. The
DAG has a root node, one or more leaf nodes representing subscribers, and
one or more non-leaf nodes representing the boolean tests.






Upon publishing an event, the event is considered as an input, and the
DAG is traversed. One or more subscribers are eliminated if any of their
predicates represented by the boolean tests are not satisfied while the DAG is
traversed, and eventually, at least one matching subscriber is identified if all
the predicates of the matching subscriber are satisfied.

The DAG is built in such a fashion that commonly shared predicates 
among subscribers are tested first so that a minimum number of boolean 
tests are conducted for finding a matching subscriber. 



Brief Description of the Drawings 

Fig. 1 is a computer for implementing one embodiment of the present
invention.

Figs. 2-4 illustrate different Directed Acyclic Graphs for solving
matching problems in a content-based publish-subscribe system.

Detailed Description

The present disclosure provides a method and system for efficiently
matching events with subscribers in a content-based publish-subscribe
system. This can be performed, for example, on a computer 100.

Referring to Fig. 1, a computer graphics processing system 100
includes a two-dimensional graphical display (also referred to as a "screen")
102 and a central processing unit 104. The central processing unit 104
contains a microprocessor and random access memory for storing programs.
A disk drive 106 for loading programs may also be provided. A keyboard 108
having a plurality of keys thereon is connected to the central processing unit
104, and a pointing device such as a mouse 110 is also connected to the
central processing unit 104. It will also be understood by those having skill
in the art that one or more (including all) of the elements/steps of the present
invention may be implemented using software executing on a general
purpose computer graphics processing system, using special purpose
hardware-based computer graphics processing systems, or using
combinations of special purpose hardware and software.
In one example of the present invention, a virtual Directed Acyclic
Graph (DAG) is first constructed for programming purposes. The DAG has
one or more branches leading to one or more nodes, and each non-leaf node in
the DAG has a matching test to be performed. The nodes that do not have any
branch directed away from them are end nodes. They are also referred to as
leaf nodes, representing the subscribers. The DAG has a root node from
which a matching process, which contains a series of matching tests, starts.
An event, as an input, is evaluated or matched by starting the matching tests
from the root node of the DAG and proceeding downward until leaf nodes
are reached. The conventional approach for constructing a DAG is done in a
bottom-up fashion, which focuses on sharing sub-predicates that are common
prefixes, and is likely to result in sub-linear complexity. The present
invention introduces a top-down fashion for constructing the DAG suited for
the matching problem in content-based publish-subscribe systems.

At each non-leaf node, corresponding tests pertinent to it are
evaluated, and depending on the result of the tests, the matching process
continues through one or more outward branches. On reaching a leaf node,
the corresponding subscriber is added to the list of "matched" subscribers for
that event. In essence, the matching process preprocesses the subscription
information into a suitably constructed DAG. Thereafter, a traversal of the
DAG for a particular event yields the list of subscribers matching that
particular event.

After the DAG is constructed, the root node usually is a dummy test
that always produces a value of TRUE so that the matching process can start
to flow downward. Starting from the root node, the test at each non-leaf node
that is reached is always executed. Each non-leaf node has branches directed
outward which are labeled with one of T, F, T_N, or F_N. T denotes branches
to be followed if the test evaluates to a logic value of TRUE. F denotes
branches to be followed if the test evaluates to a logic value of FALSE. T_N
denotes branches to be followed if the test evaluates to either TRUE or NULL.
F_N denotes branches to be followed if the test evaluates to either FALSE or
NULL. These labels denote which branches are to be followed depending on
the outcome of the test performed at the node. Thus if the test evaluates to
TRUE, all branches labeled with T and T_N are followed. If the test evaluates
to FALSE, all branches labeled with F and F_N are followed. If the test
evaluates to NULL, all branches labeled with T_N and F_N are followed. When
a leaf node is reached, the corresponding subscriber is matched.
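
The branch-selection rule just described can be summarized in a few lines of
Python. In this sketch the four edge labels are spelled "T", "F", "TN" (TRUE
or NULL) and "FN" (FALSE or NULL), and NULL is represented by None; both the
spellings and the function name labels_to_follow are assumptions made for
illustration.

```python
from typing import Optional, Set


def labels_to_follow(result: Optional[bool]) -> Set[str]:
    """Return the edge labels to follow for a test result (None means NULL)."""
    if result is True:
        return {"T", "TN"}
    if result is False:
        return {"F", "FN"}
    return {"TN", "FN"}  # NULL: only the "or NULL" branches apply


if __name__ == "__main__":
    for outcome in (True, False, None):
        print(outcome, sorted(labels_to_follow(outcome)))
```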

The non-leaf nodes represent atomic tests that can be evaluated
against an event, and predicates are formed from these tests by using the
standard boolean operators AND, OR and NOT. For example,
(NAME = 'NOVELL') and (PRICE > 20) are sample
elementary tests, each rendering a single logic result. The result can be
TRUE, FALSE, or NULL. A test may evaluate to a value of NULL if for
some reason the test cannot be evaluated against a particular event. For
example, a particular event may not contain any attribute called NAME, in
which case the test (NAME = 'NOVELL') evaluates to NULL. Furthermore,
parentheses can be used to modify the order of evaluation of the predicates.
For instance, the above mentioned two elementary tests can be combined
with the boolean operator AND to yield the predicate (NAME = 'NOVELL') AND
(PRICE > 20) to form a more complex test. Tables 1-3 below
illustrate the predefined logical test results when the standard boolean
operators AND, OR and NOT are used.

    AND   | true  | false | null
    ------+-------+-------+------
    true  | true  | false | null
    false | false | false | false
    null  | null  | false | null

    Table 1

    OR    | true  | false | null
    ------+-------+-------+------
    true  | true  | true  | true
    false | true  | false | null
    null  | true  | null  | null

    Table 2

    NOT   |
    ------+-------
    true  | false
    false | true
    null  | null

    Table 3

A subscriber therefore matches an event if the predicate supplied by 
the subscriber evaluates to a value of TRUE for that event. 
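
The three-valued results for AND, OR and NOT can be sketched in Python by
letting None stand for NULL. The helper names and3, or3, not3 and
name_equals are hypothetical; the sketch only mirrors the truth tables and
the rule that a subscriber matches when its predicate evaluates to TRUE.

```python
from typing import Any, Dict, Optional

Tri = Optional[bool]  # True, False, or None standing in for NULL


def and3(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False          # FALSE dominates AND
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True           # TRUE dominates OR
    if a is None or b is None:
        return None
    return False


def not3(a: Tri) -> Tri:
    return None if a is None else (not a)


def name_equals(event: Dict[str, Any], expected: str) -> Tri:
    """Atomic test (NAME = expected); NULL when the event has no NAME."""
    if "NAME" not in event:
        return None
    return event["NAME"] == expected


if __name__ == "__main__":
    event = {"PRICE": 25}  # no NAME attribute
    predicate = and3(name_equals(event, "NOVELL"), event["PRICE"] > 20)
    print(predicate is True)  # the subscriber matches only on TRUE
```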

An appropriate DAG is important for successful matching in the
content-based publish-subscribe system. The DAG should be constructed so
that during a traversal of the DAG for a particular event, only those leaf
nodes which correspond to subscribers that match that event are reached. An
important idea behind the construction of the DAG is to exploit common tests
and sub-predicates among the subscribers. The DAG is constructed such
that, for subscribers with a predicate as a common prefix, the predicate is
evaluated a minimum number of times (ideally once) for all subscriptions
having the same predicate. The benefit of such a DAG is that with the shared
prefixes, a test performed at each node effectively eliminates some subgroup
of the subscribers under test. That is, starting from the root, each test
performed successively "prunes" a subset of subscribers eligible for matching
until only the subscribers that match exactly with the event are reached.
Therefore, this technique greatly improves upon the conventional approach of
individually matching subscribers with events.

Referring to Fig. 2, one example of a DAG 10 is shown, illustrating a
matching process for matching a subscriber with an event based on its
predetermined predicates. In this case, a single subscriber S1 has the
predicate (A AND B AND C), where A, B and C are elementary tests. Starting
at a root node 14, which is a dummy test that always evaluates to a value of
TRUE, the matching process proceeds to a node 16 where test A is evaluated.
If it evaluates to a value of FALSE or NULL, the matching process stops. If it
evaluates to a value of TRUE, it proceeds further to a node 18 with test B.
Similarly, after B is evaluated and if the outcome is still TRUE, the process
proceeds to a node 20 with test C. If that test still renders a TRUE value, the
leaf node 12 representing the subscriber S1 is reached and the event is
matched. Conversely, if any test of the node 16, 18, or 20 does not evaluate
to a TRUE value, the matching process stops at that node and does not reach
the leaf node 12 for S1 for this particular event.

Referring now to Fig. 3, another DAG 22 matches an event with a
subscriber S2. In this example, the subscriber S2 (leaf node 23) has the
predicate (A OR B OR C), where A, B and C are atomic tests. From a root node
24, the node 26 is first reached for conducting test A. If the evaluation
renders a value of TRUE, the matching process proceeds straight to the leaf
node 23, which indicates that S2 is matched with the event. Otherwise, if a
value of FALSE or NULL is reached in node 26, the matching process arrives at
node 28 to execute test B. Again, if the test renders a TRUE value, S2 is
likewise matched with the event. However, if the node 28 produces a result of
FALSE or NULL, a node 30 representing test C is further reached. At that
node, if a value of TRUE is obtained, the matching process can reach node 23
and S2 is found to be a matching subscriber. It is noted that although this
particular DAG 22 is constructed in such a way that tests A, B and C are
evaluated sequentially, the positions of these tests are interchangeable.

Fig. 4 illustrates a more complicated DAG 32 where subscribers S1,
S2, and S3 have different subscription predicates, some portions of which are
commonly shared. More specifically, S1 has a predicate of (A AND B AND
NOT C), S2 has a predicate of (NOT A OR D AND E), and S3 has a predicate
of (A AND B AND (C OR D)). In order to construct an optimal DAG, common
tests and sub-predicates must be exploited for constructing the DAG. For S1,
S2 and S3, test A is a common prefix for all three subscriptions, and it should
be placed right after a root node 34. Hence, a node 36 represents test A
immediately after the root node 34. Similarly, predicate (A AND B) is shared
by S1 and S3, so test B should be placed immediately after the node 36 at
node 38. Consequently, predicate (A AND B) is only evaluated once in the
process for matching both S1 and S3. The obvious benefit of this method is
that the test performed at each node is used to effectively eliminate
some fraction of the subscribers. For example, if test A in DAG 32 evaluates
to FALSE, both S1 and S3 are eliminated immediately without further
processing. In a fashion similar to the processes described for Figs. 2 and
3, S1 (node 44) is matched with the event when node 40 representing test C
gives a FALSE value, and S3 (node 46) is matched if either the node 40
evaluates to TRUE, or the node 40 evaluates to FALSE or NULL and the
node 42 further evaluates to TRUE. S2 (node 48) is matched if test A
renders a FALSE, or through a longer path that traverses nodes 50 and 52.

The above examples are straightforward with only a few subscribers
and simple predicates, but the technique holds valid for a large number of
subscribers with arbitrarily complex predicates as well. This approach
significantly improves upon the conventional matching process, which is
designed to traverse each subscription to match each individual subscriber.
The actual algorithmic details of constructing the DAG are further explained
below in the context of a computer program.

Subscriptions are represented by a DAG G = (V, E) where V is the set
of vertices (nodes) and E is the set of edges of the DAG. Each internal node
"u" represents a test "u.test" to be performed on an event and each leaf node
u represents a subscriber "u.sub". Each edge e ∈ E is of the form (u, r, v)
where u, v ∈ V, and r ∈ {T, F, T_N, F_N} is a label associated with that edge.
The edge is directed from u to v. During traversal, the node v should be
visited depending on the result of the test u.test. Edges labeled T lead to
subscribers that potentially match if the test evaluates to TRUE. Edges
labeled T_N lead to subscribers that potentially match if the test evaluates
to TRUE or NULL. Edges labeled F lead to subscribers that potentially match
if the test evaluates to FALSE. Edges labeled F_N lead to subscribers that
potentially match if the test evaluates to FALSE or NULL.

DAG Creation 

The root of the DAG, represented by "G.root" in the following section of
the computer program, is a node which represents a dummy test that always
returns TRUE when evaluated against any event. When the DAG is initially
created with no subscribers, the root of the DAG is created with the dummy
test. Therefore, the sample computer code for creating a DAG "G" is as
follows:

    CreateDAG(G)
        G.root = new Internal Node
        G.root.test = DummyTest
        V = {G.root}
        E = {}

DAG Traversal

The visit(u, event) function in the following section of the computer
program is a recursive function that visits a node u in the DAG for a
particular event. On reaching a leaf node, the subscriber represented by that
leaf node is matched and processed. On reaching an internal node, the test
at that node is evaluated against the event. If the test evaluates to TRUE,
edges labeled T and T_N are followed. If the test evaluates to FALSE, edges
labeled F and F_N are followed. If the test evaluates to NULL, edges labeled
T_N and F_N are followed. The program is as follows:

    Visit(u, event)
        if (u is a leaf node)
            process u.sub, which matches event
        else
            if (u.test(event) = true)
                ∀(u, T, v) ∈ E
                    visit(v, event)
                ∀(u, T_N, v) ∈ E
                    visit(v, event)
            else
                if (u.test(event) = false)
                    ∀(u, F, v) ∈ E
                        visit(v, event)
                    ∀(u, F_N, v) ∈ E
                        visit(v, event)
                else
                    ∀(u, T_N, v) ∈ E
                        visit(v, event)
                    ∀(u, F_N, v) ∈ E
                        visit(v, event)

When an event occurs, the following section of the computer
program is invoked, which starts the matching of the event from the root of
the DAG.

    Match(G, event)
        visit(G.root, event)

This results in a depth-first traversal of the DAG. Only those leaf
nodes with subscribers matching the event are traversed.
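
A minimal Python sketch of the creation and traversal routines is given
below, with the DAG for the single subscription (A AND B AND C) wired up by
hand. The Node class, the "TN"/"FN" label spellings, the attr helper and the
duplicate check on the matched list are all assumptions of this sketch rather
than details taken from the program above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Test = Callable[[Event], Optional[bool]]  # None stands for NULL


@dataclass
class Node:
    test: Optional[Test] = None   # set on internal nodes
    sub: Optional[str] = None     # set on leaf nodes
    edges: List[Tuple[str, "Node"]] = field(default_factory=list)


# Labels to follow for each test outcome.
FOLLOW = {True: ("T", "TN"), False: ("F", "FN"), None: ("TN", "FN")}


def visit(u: Node, event: Event, matched: List[str]) -> None:
    if u.sub is not None:                 # leaf: the subscriber matches
        if u.sub not in matched:
            matched.append(u.sub)
        return
    labels = FOLLOW[u.test(event)]
    for label, v in u.edges:              # depth-first over selected edges
        if label in labels:
            visit(v, event, matched)


def match(root: Node, event: Event) -> List[str]:
    matched: List[str] = []
    visit(root, event, matched)
    return matched


def attr(name: str) -> Test:
    """Hypothetical atomic test: the boolean attribute itself, NULL if absent."""
    return lambda event: event.get(name)


# Hand-built DAG for S1 = (A AND B AND C); the root is a dummy TRUE test.
root = Node(test=lambda event: True)
a, b, c = Node(test=attr("A")), Node(test=attr("B")), Node(test=attr("C"))
s1 = Node(sub="S1")
root.edges.append(("T", a))
a.edges.append(("T", b))
b.edges.append(("T", c))
c.edges.append(("T", s1))

if __name__ == "__main__":
    print(match(root, {"A": True, "B": True, "C": True}))
```

An event that lacks attribute C stops at the node for test C, because only a
T edge leaves that node and a NULL result follows TN and FN edges only.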

Creating the DAG from Subscriptions

Construction of the DAG is done incrementally. New subscriptions are 
added onto an existing DAG as described below. 

A subscription is a boolean predicate on events. A predicate may be
just an atomic test, a disjunction of other predicates (predicates connected by
a logical OR), a conjunction of other predicates (predicates connected by a
logical AND), or a negation (NOT) of a predicate. A predicate is added to the
DAG by recursively adding the sub-predicates it comprises. The following
function of the computer program accomplishes this task:

    ProcessPredicate(P, InSet)
        if (P is a conjunction)
            return ProcessConjunction(P, InSet)
        else
            if (P is a disjunction)
                return ProcessDisjunction(P, InSet)
            else
                if (P is a negation)
                    return ProcessNegation(P, InSet)
                else
                    return ProcessAtomicTest(P, InSet, true)

The above function takes a predicate P and a set InSet as parameters.
Depending on whether P is a conjunction, disjunction, negation or an atomic
test, it invokes the appropriate function. InSet may be viewed as the set of
points in the DAG that can potentially be reached after a partial match of the
subscription. To further determine whether the subscription matches or not,
it is necessary to evaluate predicate P, and hence P must be added to the
DAG at each point in InSet. The function returns two sets of points in the
DAG: TSet and F_NSet. Assuming that the matching of an event has reached
some point in InSet, TSet is the set of points in the DAG that can potentially
be reached, depending on the event, if and only if predicate P evaluates to
TRUE. Similarly, assuming that the matching of an event has reached some
point in InSet, F_NSet is the set of points in the DAG that can potentially be
reached, depending on the event, if and only if predicate P does not evaluate
to TRUE, i.e. predicate P evaluates to either FALSE or NULL. To formalize
the notion of a "point" in the DAG, it consists of a pair (u, r) where u ∈ V
and r ∈ {T, F, T_N, F_N}. During the matching of an event, the point (u, r) is
reached if, for all (u, r, v) ∈ E, the node v is visited.

The following function ProcessConjunction(C, InSet) adds a
conjunction C to all points in the set InSet:

    ProcessConjunction(C, InSet)
        /* C = C_1 AND C_2 AND ... AND C_k */
        TSet = {}
        F_NSet = {}
        TSet_0 = InSet
        i = 1
        while (i <= k)
            [TSet_i, F_NSet_i] = ProcessPredicate(C_i, TSet_{i-1})
            F_NSet = F_NSet ∪ F_NSet_i
            i = i + 1
        TSet = TSet_k
        return [TSet, F_NSet]

Conjunction C consists of sub-predicates C_1, C_2, ... , C_k. Each
sub-predicate is recursively added to the DAG, starting with C_1 at all points
in InSet. Since a conjunction evaluates to TRUE if and only if all
sub-predicates evaluate to TRUE, each sub-predicate C_i (i > 1) is recursively
added only at points in the DAG where C_{i-1} evaluates to TRUE, i.e.
TSet_{i-1}. Therefore TSet is the set of points where all sub-predicates
evaluate to TRUE, i.e. TSet_k. Similarly, a conjunction evaluates to FALSE or
NULL if and only if one or more of its sub-predicates evaluates to FALSE or
NULL. Therefore the set F_NSet is the union of all the sets F_NSet_i. The
concept is illustrated by Fig. 2, which represents the subscription A AND B
AND C. Note that B is added at the point where A is TRUE, and similarly C
is added at the point where B is TRUE. In this case, A, B and C are atomic
tests, but the procedure is the same even if they are arbitrary predicates,
except that they are recursively added.

The following function ProcessDisjunction(D, InSet) adds a disjunction
D to all points in the set InSet:

    ProcessDisjunction(D, InSet)
        /* D = D_1 OR D_2 OR ... OR D_k */
        TSet = {}
        F_NSet = {}
        F_NSet_0 = InSet
        i = 1
        while (i <= k)
            [TSet_i, F_NSet_i] = ProcessPredicate(D_i, F_NSet_{i-1})
            TSet = TSet ∪ TSet_i
            i = i + 1
        F_NSet = F_NSet_k
        return [TSet, F_NSet]



Disjunction D consists of sub-predicates D_1, D_2, ... , D_k. Each
sub-predicate is recursively added to the DAG, starting with D_1 at all points
in InSet. Since a disjunction evaluates to FALSE or NULL if and only if all
sub-predicates evaluate to FALSE or NULL, each sub-predicate D_i (i > 1) is
recursively added only at points in the DAG where D_{i-1} evaluates to FALSE
or NULL, i.e. F_NSet_{i-1}. Therefore F_NSet is the set of points where all
sub-predicates evaluate to FALSE or NULL, i.e. F_NSet_k. Similarly, a
disjunction evaluates to TRUE if and only if one or more of its
sub-predicates evaluates to TRUE. Therefore the set TSet is the union of all
the sets TSet_i. Take a DAG representing (A OR B OR C) as an example: B
is added at the point where A is either FALSE or NULL, and C is added at
the point where B is either FALSE or NULL. In this case, A, B and C are
atomic tests, but the procedure is the same even if they are arbitrary
predicates, except that they are recursively added.

The following function ProcessNegation(N, InSet) adds a negation to all
points in InSet. It makes use of standard boolean identities to transform a
negated conjunction into a disjunction (and vice versa) and calls the
appropriate function:

    ProcessNegation(N, InSet)
        if (N = NOT D where D = D_1 OR D_2 OR ... OR D_k)
            C = (NOT D_1) AND (NOT D_2) AND ... AND (NOT D_k)
            return ProcessConjunction(C, InSet)
        else
            if (N = NOT C where C = C_1 AND C_2 AND ... AND C_k)
                D = (NOT C_1) OR (NOT C_2) OR ... OR (NOT C_k)
                return ProcessDisjunction(D, InSet)
            else
                if (N = NOT N' where N' = NOT P)
                    /* NOT (NOT P) is equivalent to P */
                    return ProcessPredicate(P, InSet)
                else
                    /* N = NOT T where T is an atomic test */
                    return ProcessAtomicTest(T, InSet, false)
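
The boolean identities used for negation can be sketched separately from the
DAG. In the Python below, a predicate is an assumed nested tuple of the form
("test", name), ("not", p), ("and", [...]) or ("or", [...]), and normalize
pushes every NOT down until it applies only to atomic tests, with a double
negation cancelling out. The tuple encoding and both function names are
hypothetical.

```python
from typing import Any, Tuple

Pred = Tuple[Any, ...]  # assumed encoding, see the lead-in above


def negate(p: Pred) -> Pred:
    """Return a predicate equivalent to NOT p with negations pushed down."""
    kind = p[0]
    if kind == "not":                       # NOT (NOT P) is P
        return normalize(p[1])
    if kind == "and":                       # NOT (C1 AND ...) = (NOT C1) OR ...
        return ("or", [negate(c) for c in p[1]])
    if kind == "or":                        # NOT (D1 OR ...) = (NOT D1) AND ...
        return ("and", [negate(d) for d in p[1]])
    return ("not", p)                       # negated atomic test


def normalize(p: Pred) -> Pred:
    """Rewrite p so that NOT appears only directly above atomic tests."""
    kind = p[0]
    if kind == "not":
        return negate(p[1])
    if kind in ("and", "or"):
        return (kind, [normalize(c) for c in p[1]])
    return p


if __name__ == "__main__":
    A, B = ("test", "A"), ("test", "B")
    print(normalize(("not", ("or", [A, B]))))
```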

The following function ProcessAtomicTest(test, InSet, result)
adds an atomic test to all points in InSet. It takes an additional
parameter result, which is FALSE if the test is negated, and TRUE
otherwise.

    ProcessAtomicTest(test, InSet, result)
        P = InSet
        Q = {}
        R = {}
        TSet = {}
        F_NSet = {}
        while (P is not empty)
            let (u, r) ∈ P
            if (∃v ∈ V | (u, r, v) ∈ E and v.test = test)
                let P_v = {(u', r') : (u', r', v) ∈ E}
                if (P_v ⊆ P)
                    R = R ∪ {v}
                else
                    v' = new Internal Node
                    v'.test = test
                    V = V ∪ {v'}
                    ∀(v, s, w) ∈ E
                        E = E ∪ {(v', s, w)}
                    ∀(u', r') ∈ P_v ∩ P
                        E = E \ {(u', r', v)} ∪ {(u', r', v')}
                    R = R ∪ {v'}
                P = P \ P_v
            else
                P = P \ {(u, r)}
                Q = Q ∪ {(u, r)}
        if (Q is not empty)
            w = new Internal Node
            w.test = test
            V = V ∪ {w}
            ∀(u, r) ∈ Q
                E = E ∪ {(u, r, w)}
            R = R ∪ {w}
        if (result = TRUE)
            ∀v ∈ R
                TSet = TSet ∪ {(v, T)}
                F_NSet = F_NSet ∪ {(v, F_N)}
        else
            ∀v ∈ R
                TSet = TSet ∪ {(v, F)}
                F_NSet = F_NSet ∪ {(v, T_N)}
        return [TSet, F_NSet]


A new subscription is added to the DAG "G" by the following function.
The predicate specified by the new subscriber is added to the DAG at the
root. A leaf node corresponding to the new subscriber is added at all points
in the DAG where the predicate evaluates to TRUE.

    AddSubscription(G, sub)
        let P = sub.predicate
        [TSet, F_NSet] = ProcessPredicate(P, {(G.root, T)})
        /* Create a new leaf node */
        v = new Leaf Node
        v.sub = sub
        V = V ∪ {v}
        ∀(u, r) ∈ TSet
            E = E ∪ {(u, r, v)}
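
The construction routines can be tried end to end with the compact Python
sketch below. It is one possible reading of the functions described above,
not a definitive implementation: predicates are assumed nested tuples
("test", name), ("not", p), ("and", [...]) and ("or", [...]); an atomic test
is assumed to be a boolean attribute that is NULL (None) when absent; nodes
are integers with node 0 as the dummy root; and the labels are spelled "T",
"F", "TN" and "FN". All class and function names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[int, str]                     # (node id, edge label)
FOLLOW = {True: {"T", "TN"}, False: {"F", "FN"}, None: {"TN", "FN"}}


class Dag:
    def __init__(self) -> None:
        self.test: Dict[int, str] = {}      # internal node -> attribute tested
        self.sub: Dict[int, str] = {}       # leaf node -> subscriber name
        self.edges: Set[Tuple[int, str, int]] = set()
        self.next_id = 1                    # node 0 is the dummy root

    def new_node(self, test: Optional[str] = None, sub: Optional[str] = None) -> int:
        n, self.next_id = self.next_id, self.next_id + 1
        if sub is None:
            self.test[n] = test
        else:
            self.sub[n] = sub
        return n


def process_atomic(g: Dag, test: str, in_set: Set[Point], result: bool):
    p, q, r = set(in_set), set(), set()
    while p:
        u, lab = next(iter(p))
        v = next((w for (a, l, w) in g.edges
                  if (a, l) == (u, lab) and g.test.get(w) == test), None)
        if v is None:                       # no node for this test here yet
            p.discard((u, lab))
            q.add((u, lab))
            continue
        pv = {(a, l) for (a, l, w) in g.edges if w == v}
        if pv <= p:                         # every way into v is in InSet: share v
            r.add(v)
        else:                               # otherwise split v off into a clone
            clone = g.new_node(test)
            for (a, s, w) in list(g.edges):
                if a == v:
                    g.edges.add((clone, s, w))
            for (a, l) in pv & p:
                g.edges.discard((a, l, v))
                g.edges.add((a, l, clone))
            r.add(clone)
        p -= pv
    if q:                                   # one new node serves all such points
        w = g.new_node(test)
        g.edges |= {(u, lab, w) for (u, lab) in q}
        r.add(w)
    t_lab, fn_lab = ("T", "FN") if result else ("F", "TN")
    return {(v, t_lab) for v in r}, {(v, fn_lab) for v in r}


def process(g: Dag, pred, in_set: Set[Point]):
    kind = pred[0]
    if kind == "and":
        t_set, fn_set = set(in_set), set()
        for c in pred[1]:                   # each C_i is added where C_{i-1} is TRUE
            t_set, f = process(g, c, t_set)
            fn_set |= f
        return t_set, fn_set
    if kind == "or":
        t_set, fn_set = set(), set(in_set)
        for d in pred[1]:                   # each D_i is added where D_{i-1} is not TRUE
            t, fn_set = process(g, d, fn_set)
            t_set |= t
        return t_set, fn_set
    if kind == "not":
        inner = pred[1]
        if inner[0] == "not":
            return process(g, inner[1], in_set)
        if inner[0] == "and":
            return process(g, ("or", [("not", c) for c in inner[1]]), in_set)
        if inner[0] == "or":
            return process(g, ("and", [("not", d) for d in inner[1]]), in_set)
        return process_atomic(g, inner[1], in_set, False)
    return process_atomic(g, pred[1], in_set, True)


def add_subscription(g: Dag, name: str, predicate) -> None:
    t_set, _ = process(g, predicate, {(0, "T")})
    leaf = g.new_node(sub=name)
    g.edges |= {(u, lab, leaf) for (u, lab) in t_set}


def match(g: Dag, event: Dict[str, Optional[bool]]) -> Set[str]:
    matched, stack = set(), [0]
    while stack:
        u = stack.pop()
        if u in g.sub:
            matched.add(g.sub[u])
            continue
        result = True if u == 0 else event.get(g.test[u])
        stack += [v for (a, lab, v) in g.edges if a == u and lab in FOLLOW[result]]
    return matched
```

For example, after adding the three subscriptions (A AND B AND NOT C),
(NOT A OR (D AND E)) and (A AND B AND (C OR D)), tests A, B and C are each
stored once and shared, and a call such as match(g, {"A": False}) visits only
the node for test A before reaching the leaf of the second subscriber.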

The above disclosure provides many different embodiments, or
examples, for implementing different features of the invention. Specific
examples of components and processes are described to help clarify the
invention. These are, of course, merely examples and are not intended to
limit the invention from that described in the claims. All systems that
support content-based subscription, i.e. allowing subscribers to specify
predicates over the content of events as subscription filters, would require
an efficient means for matching subscribers with events. These systems, for
example, could include messaging systems like the Java Message Service,
event notification services, and semantic multicast systems. The above
described technique could conceivably be used in all the above scenarios.
While the invention has been particularly shown and described with reference
to the preferred embodiment thereof, it will be understood by those skilled in
the art that various changes in form and detail may be made therein without
departing from the spirit and scope of the invention, as set forth in the
following claims.
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WHAT IS CLAIMED IS: 



1 1, A method for matching a pubHshed event with one or more 

2 subscribers in a content-based pubhsh-subscribe system in a computer 

3 network, each subscriber having one or more predetermined predicates, the 

4 method comprising: 

5 creating a virtual Direct Acyclic Graph (DAG) including one or more 

6 arbitrary boolean tests representing the predetermined predicates; 

7 ehminating, upon pubhshing the event, one or more subscribers, at 

8 least one of whose predicates is not satisfied while the DAG is traversed; 

9 and 

10 identifying at least one matching subscriber if all the predicates of the 

11 matching subscriber are satisfied, 

12 wherein the DAG has a root node, one or more leaf nodes representing 

13 subscribers, and one or more non-leaf nodes representing the boolean tests 

14 which are formed by boolean connectors. 

1 2. The method of claim 1 wherein the step of creating further 

2 includes constructing the DAG in a top-down fashion so that common 

3 predicates shared by the subscribers are examined first and a minimal 

4 number of boolean tests are conducted to identify the matching subscribers. 

1 3. The method of claim 2 further comprising adding new 

2 predicates of a new subscriber to the DAG recursively starting from the root 

3 node, and adding a leaf node at any node in the DAG where the boolean 

4 test at the node is satisfied. 
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1 4. The method of claim 1 wherein each non-leaf node directs 

2 toward other leaf nodes or non-leaf nodes based on the results of the boolean 

3 test at the non-leaf node. 

1 5. The method of claim 4 wherein the boolean test result is one of 

2 TRUE, FALSE, or NULL. 

6. The method of claim 1 wherein the boolean connectors are
AND, OR, NOT and parentheses.

7. The method of claim 1 wherein the predetermined predicate
includes an atomic test, a disjunction of sub predicates, a conjunction of sub
predicates, or a negation of a sub predicate.
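
The four predicate forms listed in claim 7 map naturally onto a small recursive data type. The following Python sketch is illustrative only; the class names `Atom`, `And`, `Or`, `Not` and the `evaluate` function are hypothetical, and the sketch evaluates a single predicate directly rather than through the DAG.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Union

Event = Dict[str, Any]


@dataclass(frozen=True)
class Atom:                      # atomic test
    test: Callable[[Event], bool]


@dataclass(frozen=True)
class And:                       # conjunction of sub predicates
    parts: Tuple["Predicate", ...]


@dataclass(frozen=True)
class Or:                        # disjunction of sub predicates
    parts: Tuple["Predicate", ...]


@dataclass(frozen=True)
class Not:                       # negation of a sub predicate
    part: "Predicate"


Predicate = Union[Atom, And, Or, Not]


def evaluate(p: Predicate, event: Event) -> bool:
    """Evaluate a predicate tree against one event."""
    if isinstance(p, Atom):
        return p.test(event)
    if isinstance(p, And):
        return all(evaluate(q, event) for q in p.parts)
    if isinstance(p, Or):
        return any(evaluate(q, event) for q in p.parts)
    if isinstance(p, Not):
        return not evaluate(p.part, event)
    raise TypeError(f"unknown predicate form: {p!r}")


# Hypothetical subscription: (symbol = IBM OR symbol = MSFT) AND NOT (price > 100)
is_ibm = Atom(lambda e: e["symbol"] == "IBM")
is_msft = Atom(lambda e: e["symbol"] == "MSFT")
pricey = Atom(lambda e: e["price"] > 100)
subscription = And((Or((is_ibm, is_msft)), Not(pricey)))

print(evaluate(subscription, {"symbol": "MSFT", "price": 80}))
```

Nesting the constructors plays the role of parentheses: the grouping of sub predicates is fixed by the tree shape rather than by operator precedence.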

8. A computer program for matching a published event with one
or more subscribers in a content-based publish-subscribe system in a
computer network, each subscriber having one or more predetermined
predicates, the program comprising instructions for:
creating in a top-down fashion a virtual Direct Acyclic Graph (DAG)
including one or more arbitrary boolean tests representing the
predetermined predicates so that common predicates shared by the
subscribers are examined first and a minimum number of boolean tests are
thus conducted to identify the matching subscriber;
eliminating, upon publishing the event, one or more subscribers, at
least one of whose predicates is not satisfied while the DAG is traversed;
and
identifying at least one matching subscriber if all the predicates of the
matching subscriber are satisfied,
wherein the DAG has a root node, one or more leaf nodes representing
subscribers, and one or more non-leaf nodes representing the boolean tests
formed by boolean connectors.

9. The program of claim 8 wherein the instructions for creating
further include, when a new subscriber is added, adding the new predicates
of the new subscriber to the DAG recursively starting from the root node,
and adding a leaf node at any node in the DAG where the boolean test at
the node is satisfied.

10. The program of claim 8 wherein each non-leaf node directs
toward other leaf nodes or non-leaf nodes based on the test result at the
non-leaf node.

11. The program of claim 10 wherein the test result is one of
TRUE, FALSE, or NULL.

12. The program of claim 8 wherein the boolean connectors are
AND, OR, NOT and parentheses.

13. The program of claim 8 wherein the predicate includes an
atomic test, a disjunction of sub predicates, a conjunction of sub predicates,
or a negation of a sub predicate.

METHOD AND SYSTEM FOR EFFICIENTLY MATCHING EVENTS
WITH SUBSCRIBERS IN A CONTENT-BASED 
PUBLISH-SUBSCRIBE SYSTEM 

Abstract 

A method is provided for efficiently solving the matching problem in 
content-based publish-subscribe systems. Subscribers may define arbitrary 
boolean predicates as conditions to subscribe to the published event. The 
subscribers and their predicates can be organized in the form of a virtual 
Direct Acyclic Graph (DAG) such that a traversal of the DAG yields one or 
more matching subscribers. The present invention improves upon the 
conventional method of linearly matching individual subscribers against an 
event. 

Figure 1

DECLARATION AND POWER OF ATTORNEY FOR

As below named inventors, we hereby declare that:

Our residence, post office address and citizenship are as stated below next to our 
names; 

We believe we are the original, first and joint inventors of the subject matter which 
is claimed and for which a patent is sought on the invention entitled 

METHOD AND SYSTEM FOR EFFICIENTLY MATCHING 
EVENTS WITH SUBSCRIBERS IN A CONTENT-BASED 
PUBLISH-SUBSCRIBE SYSTEM 



the specification of which: (check one)
XXX  is attached hereto.
___  was filed on ________ under Attorney's Docket Number ________
     as Application Serial No. ________ and was amended on ________
     (if applicable).

We hereby state that we have reviewed and understand the contents of the above 
identified specification, including the claims, as amended by any amendment 
referred to above. 

We acknowledge the duty to disclose information which is material to the 
patentability of this application in accordance with 37 CFR 1.56. 

We hereby declare that all statements made herein of our own knowledge are true 
and that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, 
under 18 USC 1001 and that such willful false statements may jeopardize the 
validity of the application or any patent issued thereon. 